Fast Gaussian Evaluations in Large Vocabulary Continuous Speech Recognition
Authors
Abstract
Rapid advances in speech recognition theory, as well as in computing hardware, have led to the development of machines that can take human speech as input, decode the information content of the speech, and respond accordingly. Real-time performance of such systems is often dominated by the evaluation of likelihoods in the statistical modeling component of the system. Statistical models are typically implemented using Gaussian mixture distributions. The primary objective of this thesis was to develop an extension of the Bucket Box Intersection (BBI) algorithm in which the dimension with the optimal number of splits can be selected when multiple minima are present. The effects of normalization of mixture weights and Gaussian clipping have also been investigated. We show that the Extended BBI algorithm (EBBI) reduces run-time by 21% without introducing any approximation error. EBBI also produced a 12% lower word error rate than Gaussian clipping for the same computational complexity. These approaches were evaluated on a wide variety of tasks including conversational speech.

Dedication

I would like to dedicate this thesis to my parents and my husband for their constant support, encouragement and sacrifices throughout my education and career development.

Acknowledgments

First of all, I want to thank Joe Picone for introducing me to the area of speech recognition. He not only created my interest in this area but also inspired me to do my graduate research in speech recognition. He never got tired of my incessantly argumentative discussions, and I always learned something valuable from those discussions. Having him as my mentor for my graduate studies has been a memorable experience for me. I would also like to thank Jon Hamaker for all the valuable suggestions and ideas he gave me for this research. I am indebted to Jurgen Fritsch for answering the questions I had regarding the Bucket-Box Intersection algorithm, which has been used as a baseline for this research.

The environment at ISIP was very conducive to serious research, and that has played a big role in my work. The excellent computing resources provided in the lab helped me a lot with the experiments. My years at ISIP will always be a sweet memory for me, not only for the knowledge I gained there but also for the opportunity to meet some wonderful people. I deeply thank all my colleagues there for making those years so enjoyable in spite of the heavy workload we always had. Finally …
Similar resources
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
A New Verification-based Fast Match Approach to Large Vocabulary Continuous Speech Recognition
Acoustic fast match is usually used to accelerate search in large vocabulary continuous speech recognition. This paper discusses a new acoustic fast match algorithm. The proposed fast match is based on incremental evaluation of the score and the use of normalized likelihood scores. This is in contrast to more traditional fast matches, where a likelihood score is used. In addition, streaming SIM...
Efficient codebook for fast and accurate low resource ASR systems
Nowadays, speech interfaces are widely employed in mobile devices, so recognition speed and power consumption are becoming new metrics of Automatic Speech Recognition (ASR) performance. For ASR systems using continuous Hidden Markov Models (HMMs), the computation of the state likelihood is one of the most time-consuming parts. Hence, we propose in this paper novel multi-level Gaussian...
Design of Fast LVCSR Systems
This paper describes the development of fast (less than 10 times real-time) large vocabulary continuous speech recognition (LVCSR) systems based on technology developed for unlimited runtime systems assembled for participation in recent DARPA/NIST LVCSR evaluations. A general system structure for 10 times real-time systems is proposed and two specific systems that have been built for Broadcast ...
Efficient codebooks for fast and accurate low resource ASR systems
Today, speech interfaces are widely employed in mobile devices, so recognition speed and resource consumption are becoming new metrics of Automatic Speech Recognition (ASR) performance. For ASR systems using continuous Hidden Markov Models (HMMs), the computation of the state likelihood is one of the most time-consuming parts. In this paper, we propose novel multi-level Gaussian selec...
MLLR method for Environmental Adaptation in a Continuous Farsi Speech Recognition
In this paper, MLLR adaptation of continuous density HMMs is investigated in a Farsi speaker-independent large vocabulary continuous speech recognition system in an attempt to improve the recognition rate in real-world situations. In the MLLR framework, we have experimented with the use of Gaussian mean transformations in global adaptation and regression-tree-based adaptation. Besides full and block-diagonal...